Speaker Recognition Using Hidden Markov Models, Dynamic Time Warping and Vector Quantisation
Authors
Abstract
This paper evaluates continuous density hidden Markov models (CDHMM), dynamic time warping (DTW) and distortion-based vector quantisation (VQ) for speaker recognition, emphasising the performance of each model structure across incremental amounts of training data. Text-independent (TI) experiments are performed with VQ and CDHMMs, and text-dependent (TD) experiments are performed with DTW, VQ and CDHMMs. We show that for TI speaker recognition, VQ performs better than an equivalent CDHMM with one training version, but is outperformed by the CDHMM when trained with ten training versions. For TD experiments we show that DTW outperforms VQ and CDHMMs for sparse amounts of training data, but with more data the performance of each model is indistinguishable. The performance of the TD procedures is consistently superior to TI, which is attributed to subdividing the speaker recognition problem into smaller speaker-word problems. We also show a large variation in performance across the different digits, concluding that digit zero is the best digit for speaker discrimination.
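As a rough illustration of two of the techniques named in the abstract, the sketch below scores an utterance against per-speaker VQ codebooks by average distortion and computes a basic DTW alignment cost between two frame sequences. The abstract does not specify the features, distance measure, codebook training method or DTW path constraints, so everything here is an assumption: squared Euclidean distance, hand-written toy codebooks, an unconstrained three-step DTW recursion, and the hypothetical names `vq_distortion`, `identify_vq` and `dtw_distance`.

```python
from math import inf


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def vq_distortion(frames, codebook):
    """Average distortion: each frame is scored against its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def identify_vq(frames, codebooks):
    """Return the speaker label whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda spk: vq_distortion(frames, codebooks[spk]))


def dtw_distance(test, template):
    """Accumulated DTW cost between two frame sequences.

    Allowed steps are diagonal, vertical and horizontal (an assumption;
    practical systems often add slope or window constraints).
    """
    n, m = len(test), len(template)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sq_dist(test[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


# Toy 2-D "features" (assumed); a real system would use per-frame
# spectral feature vectors and codebooks trained from data.
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 6.0)],
}
utterance = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8)]
best_speaker = identify_vq(utterance, codebooks)

# A template and a slower rendition of it (one frame repeated).
template = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
slow = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
dtw_same = dtw_distance(slow, template)
dtw_other = dtw_distance(slow, [(5.0, 5.0), (6.0, 6.0), (7.0, 7.0)])
```

The VQ score ignores frame order, which suits text-independent use, while the DTW cost depends on the temporal alignment with a stored template and so fits the text-dependent setting described above.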
Similar resources
Speaker recognition models
This paper evaluates continuous density hidden Markov models (CDHMM), dynamic time warping (DTW) and distortion-based vector quantisation (VQ) for speaker recognition, across incremental amounts of training data. In comparing VQ and CDHMMs for text-independent (TI) speaker recognition, it is shown that VQ performs better than an equivalent CDHMM with one training version, but is outperformed...
Feature Extraction and Classification for Automatic Speaker Recognition System – A Review
Automatic speaker recognition (ASR) has found wide application in industries such as banking, security and forensics, owing to advantages such as ease of implementation, better security and greater user-friendliness. A good recognition rate is a prerequisite for any ASR system, and it can be achieved by making an optimal choice among the available techniques for ASR. In this paper, different techniq...
Speaker Recognition with Small Training Requirements Using a Combination
Vector Quantisation (VQ) has been shown to be robust in speaker recognition systems which require a small amount of training data. However, the conventional VQ-based method only uses distortion measurements and discards the sequence of quantised codewords. In this paper we propose a method which extends the VQ distortion method by combining it with the likelihood of the sequence of VQ indices ag...
Comparison of VQ and DTW Classifiers for Speaker Verification
An investigation into the relative speaker verification performance of various types of vector quantisation (VQ) and dynamic time warping (DTW) classifiers is presented. The study covers a number of algorithmic issues involved in the above classifiers, and examines the effects of these on the verification accuracy. The experiments are based on the use of a subset from the Brent (telephone quali...
A Proportional Study on Feature Extraction Method in Automatic Speech Recognition System
Automatic speech recognition (ASR) has been the focus of many researchers for several years. The aim of a speech recognition system is for a computer to be able to "hear," "understand," and "act upon" spoken information. The speaker recognition system can be viewed as working through analysis, feature extraction, modeling, and testing/matching techniques. Speech processing is to convey information about words, speaker ...